Information science abstracts: Tracking the literature of information science. Part 2: A new taxonomy for information science
Abstract
Abstracting and indexing (A&I) services have long provided their users with aids to retrieving information from their databases. The initial products of many A&I services were printed publications organized in separate sections, similar to chapters in a book, and containing helpful author and subject indexes. Each section contained abstracts of articles on a broad subject. Users could therefore simply scan a section to find items of interest. The subject index portions of such A&I publications were generally based on controlled vocabulary terms developed at considerable cost and effort by information professionals with expertise in the discipline covered by the publication. Controlled vocabulary terms were frequently organized hierarchically into a thesaurus; the titles of the sections often became top-level terms (“main headings”) in the thesaurus. When A&I publications became available as searchable on-line databases, the controlled vocabularies and main headings were usually made searchable by the search service vendors. These added-value features provided valuable assistance to searchers, especially before the full text was available on-line. Many database producers expended considerable resources in training searchers (then mainly information professionals functioning in an intermediary mode) in how to use their thesauri. (One producer, the National Library of Medicine, initially required participation in a training course before issuing a password to its MEDLINE system!)

Author’s note: Donald T. Hawkins is the Editor-in-Chief, Information Science Abstracts; Signe E. Larson is a former Chair of Documentation Abstracts, Inc., and the previous owner of Information Science Abstracts; and Bari Q. Caton is Lead Abstractor-Indexer, Information Science Abstracts.

Received October 9, 2002; revised January 6, 2003; accepted January 6, 2003

© 2003 Wiley Periodicals, Inc. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 54(8):771–781, 2003
Taxonomies in the Web Environment

One of the concerns a database producer must address is whether to expend the considerable resources required to produce a taxonomy and then to index the information using it. When commercial on-line databases first appeared, professional searchers clamored for the inclusion of thesauri and subject classifications to help them formulate their searches. Database producers responded favorably, and many of the early on-line databases included controlled vocabulary (descriptor) fields drawn from thesauri, as well as fields containing subject classification codes. The experiences of the past few years in the Internet world have made it clear that the effort to include such data is still worthwhile. When databases of information (particularly in full text) first became available on the Internet, many users felt that thesauri and subject classifications were no longer needed and would go the way of horseless carriages. After all, the theory ran, if everything is available on-line in full text, one would only need to enter the appropriate terms into a search engine, and the desired information would be retrieved. Inexperienced searchers quickly discovered the fallacy of this approach to information retrieval when they were faced with result sets numbering in the millions of hits, with the desired information buried somewhere in them. It was soon recognized that, far from fading away in the Web environment, subject classifications and thesauri had become more important than ever, and that organizing information into subjects, or taxonomies, gave users a significant improvement in retrieval. One of the earliest examples of the use of taxonomies was Yahoo!, which used trained information professionals to organize and categorize Web sites. In addition to a simple search box, most other search engines now provide an optional taxonomy that one can use for retrieval.
The use of taxonomies has also spread to many large company intranets.

Initial Considerations

In a previous article, Hawkins (2001) described some of the history of Information Science Abstracts (ISA) and the development of a new definition of information science as well as a “map” of the field showing the subjects central to it and their relationships to those on the periphery. The work described here is an implementation and a practical application of the definition of information science and its relationship to related disciplines as outlined in Hawkins’ article. We have used his portrayal of the discipline to develop a new taxonomy for information science, and for ISA in particular, which accurately reflects the field as it exists today. These concepts have also been actively used to select relevant articles for abstracting in ISA.

ISA’s subject classification scheme (i.e., taxonomy) and controlled vocabulary last underwent modification in 1993, when minor revisions were made. Since then, as the information field experienced many major and traumatic changes, the taxonomy became outdated. It contained many outmoded terms, and it was no longer able to accommodate the rapid technological and market changes affecting the information industry. ISA’s editors and indexers found it difficult to use; consequently, one can infer that users had even more difficulty. A major flaw in the previous taxonomy was that each top-level section contained a subsection entitled “General.” Over time, many abstracts had been placed in those subsections because the field had advanced and the taxonomy had not kept up with the changes. The result was that the taxonomy became virtually useless as an information retrieval tool.

Figure 1, containing data taken from the master ISA production database, illustrates these observations. It shows the number of items posted in each section between 1992 and 2001. Note the wide variation in the number of items in the various sections.
The section with the most postings (5.11, Searching and Retrieval) had over 2,700 items in it, and the one with the fewest (5.03, Supercomputers) had just four postings. Twelve sections had over 1,000 postings each, and 15 had fewer than 100 postings. These data provided strong evidence of the need for a new information science taxonomy, and the definition and map previously developed provided an excellent conceptual foundation for it.

FIG. 1. Subject code distribution: former taxonomy.

Taxonomy Development Philosophy and Methodology

In these days of Internet search engines that are widely used by end users, the structure of a taxonomy must be clear and logical, and the taxonomy must be easy to use if there is to be any hope of its enjoying significant acceptance. It must also be dynamic, easily updated, and able to reflect rapid changes and technological advances. These principles are especially important for those users who scan the printed version of an A&I publication looking for items of interest. If a taxonomy is well constructed, its terms can be used to advantage by on-line searchers who would like to make meaningful use of the subject headings as broad limiting terms in searches. Indeed, as we have already noted, many of the commercial search hosts construct an inverted index of the main headings for this purpose. [For example, the Dialog system places numerical designations of the headings in the SC (Section Code) or MC (Main Heading Code) field, with corresponding textual equivalents in the SH (Section Heading) or MH (Main Heading) field.] These search terms are extremely useful when one wishes to retrieve a large set of broadly related items and use it as a limiting criterion on other aspects of the search, in a “successive fractions” approach to searching (see Hawkins & Wagers, 1983, for a discussion of this and other search techniques).

Note 1: ISA, one of the leading A&I databases covering the field of information science, is published by Information Today, Inc. It is available in print from the publisher, and on-line through several search services. For further information, see http://www.infotoday.com/isa/default.htm.

With these general guiding principles in mind, we collected the initial candidate terms for the taxonomy. The following sources were useful:

(1) Previous mappings of the information science field by Hawkins (2001) and others (see Hawkins’ article for references);
(2) ISA’s then-current list of descriptors;
(3) Lists of subject terms used by two other databases covering information science: R.R. Bowker’s Library and Information Science Abstracts (LISA) (now owned by Cambridge Scientific Abstracts), and H.W. Wilson’s Library Literature;
(4) The ASIS Thesaurus of Information Science and Librarianship compiled by Milstead (1998);
(5) Scope outlines for two of the central publications in information science, the Journal of the American Society for Information Science (JASIS) and the Annual Review of Information Science and Technology (ARIST); and
(6) Appropriate sections of thesauri used by several other non-information science databases that contain information science terms (e.g., the INSPEC database).

Candidate terms were grouped and organized into a preliminary taxonomy containing 13 main headings. Before being accepted, many of them were checked by searching for them in the ISA database and observing the number of postings retrieved as well as the context in which the terms were used.

Validation of the Taxonomy

Each of the authors of this article then independently indexed the December 1998 through May 1999 issues of ISA using the preliminary taxonomy. These six issues of ISA contain a total of 3,004 abstracts, so approximately 9,000 index assignments were made.
Each abstract was assigned to the subject classification judged most relevant by the indexer. To simplify and speed this process, abstracts were given only a single classification number; in real life, of course, many abstracts have multiple classifications. Classification numbers were of the form x.y, where x is the “main heading” and y is the “subheading.” Abstracts for which the indexer felt that no appropriate classification existed were noted separately.

The classification assignments were assembled into a Microsoft Access database. If at least two of the indexers agreed on the classification of an abstract, then that classification number was accepted as its correct indexing. Abstracts that were placed in different classifications by all three indexers, as well as those that had been noted separately, were reviewed; either the different points of view were reconciled, or appropriate modifications were made to the taxonomy to accommodate them. Using the Access database, we were able to quickly obtain data on the distribution of abstracts in each of the sections (see Fig. 2). As a result, we were able to identify some extraneous categories and make the taxonomy more consistent by combining closely related categories into one.

The distribution of section assignments, as shown in Figure 2, falls into three broad groups. Sections 1 and 8, which were concerned with basic information science research and electronic information systems, received over 35% of the assignments. A second group of sections (2 through 6), taken together, received about 45%, and the remaining 20% of the assignments fell into one of the other sections. This exercise was extremely helpful and revealing. The abstracts that could not be indexed pointed out several gaps in the preliminary taxonomy, and the distribution shown in Figure 2 showed that further refinement of the sections would be appropriate.
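The acceptance rule described above (a classification is accepted when at least two of the three indexers agree, otherwise the abstract goes to review) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Access implementation; the abstract identifiers, the sample codes, and the function name `consensus` are hypothetical.

```python
from collections import Counter

def consensus(codes):
    """Return the x.y code chosen by at least two indexers, or None if all differ."""
    code, votes = Counter(codes).most_common(1)[0]
    return code if votes >= 2 else None

# Hypothetical sample: abstract id -> codes assigned by the three indexers.
assignments = {
    "A1": ("1.2", "1.2", "8.1"),  # two agree: accepted as 1.2
    "A2": ("3.4", "5.1", "7.2"),  # all differ: needs review
    "A3": ("2.1", "2.1", "2.1"),  # all agree: accepted as 2.1
}

accepted = {}
needs_review = []
for abstract_id, codes in assignments.items():
    code = consensus(codes)
    if code is None:
        needs_review.append(abstract_id)
    else:
        accepted[abstract_id] = code

# Distribution of accepted abstracts over main headings (the x in x.y).
distribution = Counter(code.split(".")[0] for code in accepted.values())
```

Counting on the main heading alone, as in the last line, gives the kind of per-section distribution that the text reports for Figure 2.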
Indexing Consistency

The validation exercise also allowed us to measure consistency among the three indexers, shown in Table 1. Considering the full classification of each item, consisting of both main and subheadings, at least two of the indexers agreed on the classification of 70% of the 3,004 abstracts from these issues of ISA. Because the distinction between many of the subheadings may be rather small, we also looked at the consistencies when only the main headings are taken into consideration. On that basis, consistency rose significantly, with at least two of the three indexers agreeing on 81% of the abstracts. Figure 3 shows the distribution of the classifications for the 2,428 abstracts where at least two indexers agreed on the main headings. This distribution confirmed our results from Figure 2, showing that some sections could be combined and others should be considered for subdivision. (For example, Fig. 3 shows that Section 9 probably would not contain enough abstracts over time to justify its existence, and so it should be combined with another related section.)

A number of studies have observed that indexer consistencies generally do not exceed 50%, and that this level has not changed significantly in the past 30 years (Leininger, 2000). Sievert and Andrews (1991) found that 71 pairs of duplicate records in ISA had the same descriptors in about 48% of the cases. Reich and Biever (1991) compared the descriptors assigned by different indexers from the same thesaurus in two agriculture databases and found that they averaged between 24% and 45% agreement in different samples. Our results show significantly higher consistencies than those found previously: in 35% of the cases, all three of us agreed on the assignment, and in an additional 45% of the cases two of us agreed. Only in 19% of the cases did all three of us disagree on the main heading assignment of an abstract.
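As a rough illustration of how such consistency figures can be computed, the following Python sketch measures agreement among three indexers at two levels: the full x.y code and the main heading only. The sample data and the function names are hypothetical, and the sketch is not the procedure the authors actually ran. For reference, the reported 2,428 of 3,004 abstracts corresponds to about 81%.

```python
def agreement_level(codes):
    """Return 3 if all three codes match, 2 if exactly two match, 0 if all differ."""
    distinct = len(set(codes))
    return {1: 3, 2: 2, 3: 0}[distinct]

def main_heading(code):
    """Reduce an x.y classification number to its main heading x."""
    return code.split(".")[0]

def consistency(assignments, key=lambda code: code):
    """Share of abstracts at each agreement level, after mapping codes with `key`."""
    levels = [agreement_level(tuple(key(c) for c in codes)) for codes in assignments]
    n = len(levels)
    return {
        "all_three": levels.count(3) / n,
        "two_of_three": levels.count(2) / n,
        "none": levels.count(0) / n,
    }

# Hypothetical sample: one tuple of three indexers' codes per abstract.
sample = [
    ("1.2", "1.2", "1.2"),
    ("1.2", "1.3", "8.1"),
    ("2.1", "2.4", "2.5"),
    ("3.1", "5.2", "7.1"),
]

full = consistency(sample)                      # compare complete x.y codes
by_main = consistency(sample, key=main_heading)  # compare main headings only

at_least_two_full = full["all_three"] + full["two_of_three"]
at_least_two_main = by_main["all_three"] + by_main["two_of_three"]
```

In this toy sample, agreement by at least two indexers rises from 25% on the full code to 75% on the main heading, which mirrors the direction (not the magnitude) of the 70% and 81% figures in the text.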
It is of interest to note that in this work, we assigned headings based on the abstracts alone, in contrast to most indexing studies, in which indexing is done from the full article. Despite indexing from the abstracts, the level of agreement among us is significantly higher than has generally been observed in the indexing studies cited above. Some possible reasons are:

(1) We all have significant subject expertise in information science, as well as many years of continuous practical experience in the field. (Two of us have each been working in various areas of information science for over 25 years, and the other one of us has been involved with abstracting and indexing for 12 years.)
(2) We had the benefit of previous exposure to the material used in this study. Because of our close involvement with and interest in ISA, this research was the second time we had seen either the abstracts or the articles they represented.
(3) As we proceeded through the corpus of material and questions arose, we learned from our experiences in choosing terms for the items indexed earlier in the process.
(4) Because we used professionally written abstracts in our tests, we could be confident that each abstract clearly identified the main concepts of the article it represented. Because the abstracts were brief and focused, there was less uncertainty (and, hence, less disagreement) than might have occurred if we had used complete articles.

FIG. 2. Results of first verification test.

Final Taxonomy Development and Use

As we were proceeding through the first taxonomy validation, we began drafting revisions to correct deficiencies that were uncovered.
Besides using our practical experience gained in the field and the experience of the validation test, we also examined the concepts propounded in Hawkins’ earlier article and ensured that they were incorporated into the taxonomy. A large part of this work was done empirically because ISA production continued as this research proceeded. We were thus able to take advantage of what authors in the field were publishing at that moment. In the “final” taxonomy (we regard it as final only with regard to this research project), the 13 main headings were consolidated into 11, and a number of the subcategories were realigned as well.

We also conducted a second validation test. Two of the authors manually indexed the May, June/July, and August 2001 issues of ISA, containing a total of 1,265 abstracts. The abstracts were copied from the master ISA production database to a separate Microsoft Access database, and the taxonomy was also converted to an Access database. A simple form, shown in Figure 4, displayed the title and abstract fields of each abstract and also provided an interface to the taxonomy, thus facilitating searching the taxonomy and entering the appropriate subject code.

This second test resulted in the taxonomy presented in the Appendix, which became the “ISA Taxonomy” at the beginning of 2002. To date, six issues of ISA, containing a total of 2,692 abstracts, have been produced using it. Each ISA abstract must be assigned to at least one subject classification and can be assigned to up to three. For the 2,692 abstracts, a total of 3,504 subject classification assignments were made. Their distribution is shown in Figure 5, which shows that the abstracts are well distributed over the 11 main sections and subsections of the taxonomy. There are a number of heavily posted sections as well as a number with only a few postings.
We examined each of the subsections listed in Table 2, all of which display a low number of postings, and concluded that they should remain as discrete subsections. Most of these subsections represent significant or growing areas of information science or, because of the focus of the field, they are important, even though not many articles concerning them are currently being published. We must remember that our sample for this test included only 3 months’ worth of data from the then most recent issues of ISA. In one or two cases, our years of experience in the field influenced our decision. We also considered the three most highly posted subsections to determine if they should be subdivided (Table 3). It is hardly surprising that these subsections are highly posted; because they represent the essence of the field, they contain many articles on closely related topics. We therefore decided not to subdivide them.
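The review of low-posted and highly posted subsections can be illustrated with a short Python sketch that counts postings when each abstract carries one to three classification codes and then flags candidates for review. The records, the taxonomy list, and the thresholds are hypothetical; the text does not state what cutoffs, if any, the authors used. For scale, the reported 3,504 assignments over 2,692 abstracts work out to roughly 1.3 classifications per abstract.

```python
from collections import Counter

MAX_CODES_PER_ABSTRACT = 3  # from the text: at least one, at most three codes

def count_postings(records):
    """Count postings per subsection code across all abstracts."""
    postings = Counter()
    for abstract_id, codes in records.items():
        if not 1 <= len(codes) <= MAX_CODES_PER_ABSTRACT:
            raise ValueError(f"{abstract_id}: expected 1 to 3 codes, got {len(codes)}")
        postings.update(codes)
    return postings

def flag_subsections(postings, taxonomy, low, high):
    """Return (low_posted, high_posted) code lists using the given thresholds."""
    low_posted = [code for code in taxonomy if postings[code] <= low]
    high_posted = [code for code in taxonomy if postings[code] >= high]
    return low_posted, high_posted

# Hypothetical sample: abstract id -> assigned subsection codes.
records = {
    "A1": ["1.1"],
    "A2": ["1.1", "4.2"],
    "A3": ["1.1", "4.2", "9.3"],
    "A4": ["1.1"],
}
taxonomy = ["1.1", "4.2", "9.3", "10.1"]  # hypothetical subsection list

postings = count_postings(records)
low_posted, high_posted = flag_subsections(postings, taxonomy, low=1, high=4)
average_codes = sum(postings.values()) / len(records)
```

Flagging is only a prompt for editorial judgment: as the text explains, the authors kept both the low-posted and the highly posted subsections after reviewing them.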